50 research outputs found

    Self-supervised approach for Urban Tree Recognition on Aerial Images

    In the light of Artificial Intelligence aiding modern society in tackling climate change, this research looks at how to detect vegetation in aerial-view images using deep learning models. The task is part of a proposed larger framework for building an ecosystem to monitor air quality and related factors such as weather, transport, and vegetation, as well as the number of trees in any urban city in the world. The challenge lies in building or adapting tree recognition models for a new city with minimal or no labelled data. This paper explores self-supervised approaches to the problem and arrives at a system achieving 0.89 mean average precision on Google Earth images of the city of Cambridge.
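    The 0.89 figure above is a mean average precision (mAP). As a quick illustration of how that metric is typically computed (a pure-Python sketch over hypothetical ranked detections, not the authors' evaluation code):

```python
def average_precision(ranked_hits):
    """AP for one class: ranked_hits is a list of 1/0 flags for
    detections sorted by descending confidence (1 = true positive)."""
    total_positives = sum(ranked_hits)
    if total_positives == 0:
        return 0.0
    ap, true_pos = 0.0, 0
    for rank, hit in enumerate(ranked_hits, start=1):
        if hit:
            true_pos += 1
            ap += true_pos / rank  # precision at this recall point
    return ap / total_positives

def mean_average_precision(per_class_hits):
    """mAP: the mean of per-class average precisions."""
    aps = [average_precision(hits) for hits in per_class_hits]
    return sum(aps) / len(aps)
```

    For example, a class whose two ground-truth trees are ranked first and second scores AP = 1.0, while one found only at rank 2 scores AP = 0.5, giving an mAP of 0.75 over the two classes.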

    Speech Emotion Recognition Using Attention Model

    Speech emotion recognition is an important research topic that can help to maintain and improve public health and contribute towards the ongoing progress of healthcare technology. There have been several advancements in the field of speech emotion recognition systems, including the use of deep learning models and new acoustic and temporal features. This paper proposes a self-attention-based deep learning model that was created by combining a two-dimensional Convolutional Neural Network (CNN) and a long short-term memory (LSTM) network. This research builds on the existing literature to identify the best-performing features for this task, with extensive experiments on different combinations of spectral and rhythmic information. Mel Frequency Cepstral Coefficients (MFCCs) emerged as the best-performing features. The experiments were performed on a customised dataset developed as a combination of the RAVDESS, SAVEE, and TESS datasets. Eight emotional states (happy, sad, angry, surprise, disgust, calm, fearful, and neutral) were detected. The proposed attention-based deep learning model achieved an average test accuracy of 90%, a substantial improvement over established models. Hence, this emotion detection model has the potential to improve automated mental health monitoring.
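    The attention mechanism described above can be illustrated in miniature: an attention layer turns a sequence of per-frame features (e.g. CNN-LSTM outputs over MFCC frames) into a single utterance-level vector. The sketch below assumes the attention scores are given directly; in the paper's model they would come from a learned projection of the frames.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def attention_pool(frames, scores):
    """Collapse a sequence of frame vectors into one utterance
    vector, weighting each frame by its softmax-normalised
    attention score."""
    weights = softmax(scores)
    dim = len(frames[0])
    pooled = [0.0] * dim
    for w, frame in zip(weights, frames):
        for i in range(dim):
            pooled[i] += w * frame[i]
    return pooled
```

    With equal scores this reduces to mean pooling; higher scores let emotionally salient frames dominate the utterance representation.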

    Syllabic Pitch Tuning for Neutral-to-Emotional Voice Conversion

    Prosody plays an important role in both identification and synthesis of emotionalized speech. Prosodic features like pitch are usually estimated and altered at a segmental level, based on short windows of speech where the signal is expected to be quasi-stationary. This results in a frame-wise change of acoustical parameters when synthesizing emotionalized speech. In order to convert neutral speech from a speaker into emotional speech, it may be better to alter the pitch parameters at a suprasegmental level, such as the syllable level, since the changes in the signal are then more subtle and smooth. In this paper we aim to show that the pitch transformation in a neutral-to-emotional voice conversion system may yield better speech quality if the transformations are performed at the supra-segmental (syllable) level rather than as a frame-level change. Subjective evaluation results are presented to determine whether the naturalness, speaker similarity, and emotion recognition tasks show any performance difference.
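    The suprasegmental idea can be sketched simply: instead of re-estimating pitch frame by frame, a single factor is applied across each whole syllable, giving a smoother contour. The function below is an illustrative simplification with hypothetical syllable boundaries and factors, not the paper's actual pitch-tuning procedure.

```python
def scale_pitch_per_syllable(f0, boundaries, factors):
    """Apply one multiplicative pitch factor per syllable rather
    than an independent change at every frame.
    f0: per-frame pitch values (Hz); boundaries: (start, end)
    frame-index pairs per syllable; factors: one scale per syllable."""
    out = list(f0)
    for (start, end), factor in zip(boundaries, factors):
        for i in range(start, end):
            out[i] = f0[i] * factor
    return out
```

    Raising the first syllable by 10% and lowering the second by 10%, for example, keeps each syllable's internal contour intact while shifting its overall level.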

    Speech recognition with speech synthesis models by marginalising over decision tree leaves

    There has been increasing interest in the use of unsupervised adaptation for the personalisation of text-to-speech (TTS) voices, particularly in the context of speech-to-speech translation. This requires that we are able to generate adaptation transforms from the output of an automatic speech recognition (ASR) system. An approach that utilises unified ASR and TTS models would seem to offer an ideal mechanism for applying unsupervised adaptation to TTS, since transforms could be shared between ASR and TTS. Such unified models should use a common set of parameters. A major barrier to such parameter sharing is the use of differing contexts in ASR and TTS. In this paper we propose a simple approach that generates ASR models from a trained set of TTS models by marginalising over the TTS contexts that are not used by ASR. We present preliminary results of our proposed method on a large-vocabulary speech recognition task and provide insights into future directions for this work.
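    The marginalisation can be pictured as collapsing the TTS decision-tree leaves that differ only in contexts ASR ignores into a single ASR distribution. A minimal sketch, assuming each leaf carries an occupancy count and a mean vector (the paper's actual statistics and weighting are more involved):

```python
def marginalise_leaves(leaves):
    """Collapse TTS decision-tree leaves sharing the same ASR
    context into one distribution. Each leaf is
    (occupancy_count, mean_vector); the resulting ASR mean is the
    occupancy-weighted average over the TTS-only contexts."""
    total = sum(count for count, _ in leaves)
    dim = len(leaves[0][1])
    mean = [0.0] * dim
    for count, mu in leaves:
        for i in range(dim):
            mean[i] += (count / total) * mu[i]
    return mean
```

    A leaf seen three times as often as its sibling pulls the merged mean three times as hard, so frequently occupied TTS contexts dominate the derived ASR model.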

    Vocal Tract Length Normalization for Statistical Parametric Speech Synthesis

    Vocal tract length normalization (VTLN) has been successfully used in automatic speech recognition for improved performance. The same technique can be implemented in statistical parametric speech synthesis for rapid speaker adaptation during synthesis. This paper presents an efficient implementation of VTLN using expectation maximization and addresses the key challenges faced in implementing VTLN for synthesis. Jacobian normalization, high-dimensional features, and truncation of the transformation matrix are among the challenges presented, together with appropriate solutions. Detailed evaluations are performed to identify the most suitable technique for using VTLN in speech synthesis. Evaluating VTLN in the framework of speech synthesis is also not an easy task, since the technique does not work equally well for all speakers. Speakers have been selected based on different objective and subjective criteria to demonstrate the difference between systems. The best method for implementing VTLN is confirmed to be the use of lower-order features for estimating the warping factors.
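    A common form of VTLN warping (one possible choice for illustration, not necessarily the exact function used in the paper) is a piecewise-linear warp of the normalised frequency axis, controlled by the single warping factor that adaptation must estimate:

```python
def piecewise_linear_warp(freq, alpha, f_cut=0.7, f_nyq=1.0):
    """Warp a normalised frequency: scale by alpha below the cut-off
    f_cut, then interpolate linearly so the Nyquist frequency f_nyq
    maps to itself (keeping the warped axis within range).
    f_cut and f_nyq here are illustrative defaults."""
    if freq <= f_cut:
        return alpha * freq
    # Line from (f_cut, alpha * f_cut) up to the fixed point (f_nyq, f_nyq).
    slope = (f_nyq - alpha * f_cut) / (f_nyq - f_cut)
    return alpha * f_cut + slope * (freq - f_cut)
```

    A single scalar alpha (typically near 1.0) stretches or compresses the spectrum to account for differing vocal tract lengths, which is why so little adaptation data is needed to estimate it.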

    Study of Jacobian Normalization for VTLN

    The divergence of the theory and practice of vocal tract length normalization (VTLN) is addressed, with particular emphasis on the role of the Jacobian determinant. VTLN is placed in a Bayesian setting, which brings in the concept of a prior on the warping factor. The form of the prior, together with acoustic scaling and numerical conditioning, are then discussed and evaluated. It is concluded that the Jacobian determinant is important in VTLN, especially for the high-dimensional features used in HMM-based speech synthesis, and that the difficulties normally associated with the Jacobian determinant can be attributed to the prior and scaling.
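    The role of the Jacobian determinant can be seen in a one-dimensional sketch: when an observation is linearly warped before being scored against a Gaussian, the log-determinant of the transform must be added so that the density remains properly normalised. The function below is illustrative only, not the paper's formulation:

```python
import math

def warped_log_likelihood(x, scale, mean, var):
    """Log-likelihood of x under a 1-D Gaussian after the linear
    warp y = scale * x, including the Jacobian term log|scale|
    that keeps the warped density integrating to one."""
    y = scale * x
    log_gauss = -0.5 * (math.log(2 * math.pi * var) + (y - mean) ** 2 / var)
    return log_gauss + math.log(abs(scale))
```

    Dropping the log|scale| term biases the comparison between warping factors, which is the practical difficulty the paper traces back to the prior and acoustic scaling.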

    Forest Terrain Identification using Semantic Segmentation on UAV Images

    Beavers are known to alter the terrain of their habitat, supporting biodiversity in the area, and their activity has recently been linked to climate change mitigation through reduced greenhouse gas levels in the region. To analyse the impact of beavers' habitat on a region, it is therefore necessary to estimate the terrain alterations caused by beaver activity. Such terrain analysis can also play an important role in domains like wildlife ecology, deforestation, land-cover estimation, and geological mapping. Deep learning models are known to provide better estimates for automatic feature identification and classification of a terrain. However, such models require significant training data. Pre-existing terrain datasets (both real and synthetic), such as CityScapes, PASCAL, and UAVID, are mostly concentrated on urban areas and include roads, pathways, buildings, etc. Such datasets are therefore unsuitable for forest terrain analysis. This paper contributes by providing a finely labelled novel dataset of forest imagery around beavers' habitat, captured with a high-resolution camera on an aerial drone. The dataset consists of 100 such images, labelled and classified into 9 different classes. Furthermore, a baseline is established on this dataset using state-of-the-art semantic segmentation models, with performance metrics including Intersection over Union (IoU), Overall Accuracy (OA), and F1 score.
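    The reported segmentation metrics can each be computed per class from pixel-level counts of true positives, false positives, and false negatives; a minimal sketch:

```python
def iou(tp, fp, fn):
    """Intersection over Union for one class from pixel counts:
    overlap divided by the union of prediction and ground truth."""
    denom = tp + fp + fn
    return tp / denom if denom else 0.0

def f1(tp, fp, fn):
    """F1 score: harmonic mean of precision and recall,
    simplified to 2*TP / (2*TP + FP + FN)."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0
```

    Note that F1 is always at least as large as IoU for the same counts, so the two metrics rank models similarly but are not interchangeable as absolute numbers.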

    Urban Tree Species Classification Using Aerial Imagery

    Urban trees help regulate temperature, reduce energy consumption, improve urban air quality, reduce wind speeds, and mitigate the urban heat island effect. Urban trees also play a key role in climate change mitigation by capturing and storing atmospheric carbon dioxide, the largest contributor among greenhouse gases. Automated tree detection and species classification using aerial imagery can be a powerful tool for sustainable forest and urban tree management. Hence, this study first offers a pipeline for generating a labelled dataset of urban trees using Google Maps aerial images, and then investigates how state-of-the-art deep Convolutional Neural Network models such as VGG and ResNet handle the classification of urban tree aerial images under different parameters. Experimental results show our best model achieves an average accuracy of 60% over 6 tree species.

    VTLN Adaptation for Statistical Speech Synthesis

    The advent of statistical speech synthesis has enabled the unification of the basic techniques used in speech synthesis and recognition. Adaptation techniques that have been successfully used in recognition systems can now be applied to synthesis systems to improve the quality of the synthesized speech. The application of vocal tract length normalization (VTLN) to synthesis is explored in this paper. VTLN-based adaptation requires estimation of a single warping factor, which can be accurately estimated from very little adaptation data and gives additive improvements over CMLLR adaptation. The challenge of estimating accurate warping factors using higher-order features is solved by initializing warping factor estimation with the values calculated from lower-order features.